Airports constitute some of the most complex systems enabling mobility in today’s society. The various components
inside the airport system each have specific requirements and include numerous systems, processes and stakeholders.
Stakeholders comprise government entities (security and customs), private bodies (airport owner, airlines) and
customers (passengers and cargo).

An airport determines the traveler’s first and last impression of a city. A positive airport experience is beneficial to
sales and influences future travel choices (cite airport council international customer service). Airports have taken steps
to increase their customer focus. An improved customer experience relies on technologies supporting better service. Two
examples of such technologies are Radio-Frequency Identification (RFID), which would enable airports to track passengers
and bags effectively [1] [2], and Bluetooth [3] to support passenger tracking. To serve the needs of airport customers and
operators, such technologies need to be part of the activities of airport users at the airport. While at the airport, passengers
engage in processing and discretionary activities. Processing activities are enforced to conform to the legal and regulatory
requirements for air travel. They correspond to: check-in, filling out departure paperwork, going through identity and security
checkpoints, boarding and deboarding a plane. Passengers actually spend a small portion of their time at airports engaging
in processing activities, including time spent waiting to be processed [4].

A common feature of service systems is that the demand for service varies throughout the day.

Air terminal queues [5]

Staffing requirements are part of the design and management of the service system. Over the long-term planning horizon,
managers set the system capacity. Over the short-term horizon, managers make agent scheduling decisions, indicating the
number of agents working during specific hours, and breaking down the day into time intervals.

The scheduling decision is often made based on the solution of an integer linear program [6], [7], [8]. In real time,
managers may make additional adjustments (flexing decisions) to move agents on and off the line of duty. This can be
achieved if there are additional agents on site working on other tasks or if more agents can be called on short notice.

Robertson et al. [9] provided a detailed procedure to model passenger arrivals to estimate how many passengers arrived
at the airport during each day and time of day. The raw passenger volume for each time interval was the final product and
corresponded to the passenger arrival pattern. Further analysis provided access to passenger arrival patterns at different
processing points (check-in, baggage security, security checkpoint...). The passenger arrival pattern for each checkpoint
was computed using several inputs: passenger arrival behavior, flight schedules, aircraft capacity, load factors and transfer
rates.

In a system where congestion can build up at peak hours, the number of servers can be dynamically adjusted according to
queue length. If the queue reaches an upper threshold, additional servers are opened. If some servers are idle, they get
closed. By choosing appropriate thresholds, the queue length can be controlled in a certain range with high probability.
This staffing policy is called congestion-based staffing [10].
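The threshold rule described above can be sketched as a single control step. This is only an illustration under stated assumptions: the function name, the one-server-at-a-time adjustment and the `min_servers`/`max_servers` bounds are hypothetical and are not taken from [10].

```python
def adjust_servers(queue_length, open_servers, idle_servers,
                   upper_threshold, min_servers=1, max_servers=10):
    """Return the number of open servers after one control step.

    Assumed rule: open one more server when the queue reaches the upper
    threshold, close one server when at least one server is idle.
    """
    if queue_length >= upper_threshold and open_servers < max_servers:
        return open_servers + 1  # congestion: open an additional server
    if idle_servers > 0 and open_servers > min_servers:
        return open_servers - 1  # spare capacity: close an idle server
    return open_servers          # otherwise keep the staffing level
```

Calling the function once per decision epoch keeps the queue length between the two regimes; the choice of `upper_threshold` is what determines the range in which the queue is held.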



A. Literature Review 


A growing number of airports are providing Wi-Fi access to their passengers. Hence large volumes of signals from
laptops, tablets and smart phones are picked up at the airport. By design, most wireless devices aim at saving battery and
therefore only periodically connect to the Wi-Fi network. This leads to a set of data with discrete time location
snapshots, and not a continuous set of location points of the device. In the 1990s, Lemer [11] stressed the need to develop
performance measures at airports for different stakeholders, such as operators, airlines and passengers, who each have
their set of measures. He suggested that queues, simulation and flow methods were suited to modeling airports and hence
measuring their performance. Using simulations and system dynamics, Manataki et al. [12] modeled airport performance
according to the staffing numbers required to process passengers and their waiting times. They later [13] surveyed existing
analytical and simulation tools for airport analysis. They concluded that most analytical models focused on a particular
area of the airport (e.g. check-in, baggage screening) but few tackled the entire airport terminal operations. Bluetooth has
recently been used in the SPOPOS project to provide location-based services to passengers and airport operators [3]. From
the airport perspective, it helped trigger alerts when queues were building up, to help passengers reach their plane on time.

Fig. 1: DIMIA count of arriving passenger stamps at Immigration in 2012


I. Model 


A. Data sources 

Three different sources of information were used for the project: 

• The Flight Information Display System (FIDS) dataset contains information on gate, block, estimated and scheduled 
times for departing and arriving flights to Sydney International Airport. The records do not specify at what time the 
Estimated Time of Arrival (ETA) was recorded, nor how many times it was modified. Neither does it include the 
number of passengers on the flight, the time of arrival for an outbound flight or the time of departure for an inbound 
flight. 

The simulation starts with the schedule of flight arrivals obtained from FIDS, and the passenger count estimated
from the immigration files.

• Passenger time stamps at immigration were recorded by the Australian Department of Immigration and Multicultural
and Indigenous Affairs (DIMIA). The DIMIA dataset consists of all border crossing activities for 2012. For any
passenger, his or her nationality, the time stamp at immigration, his or her origin or destination airport and flight
number are entered in the database. The historic service rates at immigration can be derived from this information.
The flight number is used to compute the average number of passengers per flight by matching passengers to flights
in FIDS. It allows us to generate a distribution of passenger occupancy per flight ID. The dataset is also used to
determine the service rate at each immigration desk at any hour during any day of the week. Note that each day of
the week has a specific service rate distribution.

Since every DIMIA record contains the processing desk ID along with the time of the stamp and other passenger
information, the number of unique open desks can be estimated for a given time period. The service rate per desk
per hour is the ratio of the number of passengers processed to the number of open desks.

The limitation of the dataset is the inconsistency of the manual recordings. Some entries are missing. In some places,
tail numbers are recorded in place of flight numbers, if that information is present at all. Many days in October and December
are also missing from the data, as shown in Figure 1.

• The airport is equipped with the SITA iflow tool [14], which returns anonymous Wi-Fi tracking information. The iflow
tool consists of a network of more than 400 Wi-Fi access points, 130 people-counters and 50 Bluetooth sensors spread
throughout the terminals [14].

Wi-Fi tracking information includes (x,y) coordinates of the devices, the zone(s) assigned to the device by a
triangulation algorithm and the time at which the device (e.g. computer, smart phone or tablet) is connected to the network.
The results can lack precision due to the low accuracy of the triangulation and the low frequency of the signal updates.
Many devices are observed only a couple of times at the airport at time intervals that can be as large as an hour. A
point is sometimes allocated to multiple neighbouring zones due to large uncertainties in measurements. Furthermore,
some days have many entries and others very few. Within the same day, the quality of the data also varies by airport
zones. Many zones did not have any recording of passengers, as illustrated in Table II.

Fig. 2: DWELL count of passengers recorded in Immigration zones in 2012


TABLE I: Data sources available on Sydney Airport.

| Label | Description | Date Range | Size | Challenges |
|---|---|---|---|---|
| DIMIA | Passenger time stamps at immigration for each border crossing | Jan. 2012 to May 2013 | 15,461,430 passengers (6,756,997 arrivals and 8,704,433 departures) | Flight information or origin of the flight is not present |
| FIDS | Arriving and departing flight information including block, scheduled, estimated times and flight number | Jan. 6th-Dec. 1st 2012 excluding May | 578,104 (287,447 arrivals and 290,656 departures) | No time of recording |
| DWELL | Wi-Fi enabled devices tracking data for each location (x,y), triangulated zone, time stamp | July 1st-December 12th | 2,047,235 unique device IDs (827,474 arrivals and 1,236,372 departures, overlapping) | Noisy information, inaccurate triangulations, unknown sampling |

Table I describes the content of the three data sets. The DWELL data lacks information for most of the days in August
and September, as well as the second half of December, as can be observed in Figure 2. The walk times are fitted to a
subset of the days that are contained in our records.

The DWELL data source is inconsistent between airport zones (see Table II). Due to the dearth of information for some zones
of the airport, we decide to model the walk speed of passengers instead of their walk times. A walk speed distribution is
computed by dividing the distance from each gate to immigration by the corresponding walk times. Figure 3
illustrates the distributions of the walk times from three selected gates to immigration. In Figure 4, we can see that the
shape of the distribution is well preserved for walk speeds.

Fig. 3: Walk times from gates in Pier A to immigration.



Fig. 4: Walk Speeds Distribution from all gates 

TABLE II: Number of device IDs recorded per zone in 2012

| Zone | Count | Zone | Count |
|---|---|---|---|
| AQIS PIER C | 0 | Outbound-immigration | 476,603 |
| pierB-gate8 | 0 | dep-immi | 287,381 |
| pierB-gate9 | 0 | depart-dutyfree-all | 294,722 |
| pierB-gate10 | 0 | depart-dutyfree1 | 164,428 |
| pierB-gate25 | 0 | depart-dutyfree2 | 109,532 |
| pierB-gate30 | 0 | depart-dutyfree3 | 135,641 |
| pierB-gate31 | 0 | depart-dutyfree4 | 121,599 |
| pierB-gate32 | 0 | depart-immigrationscreening | 0 |
| pierB-gate33 | 0 | depart-foodcourt | 0 |
| pierB-gate34 | 0 | depart-forum-all | 0 |
| pierB-gate35 | 0 | depart-landside | 852,596 |
| pierB-gate36 | 0 | depart-landside-checkin1 | 0 |
| pierB-gate37 | 0 | depart-landside-checkin2 | 0 |
| pierB-east | 130,905 | depart-landside-checkin3 | 0 |
| pierB-north | 0 | depart-sec | 280,048 |
| pier B Inbound duty free | 0 | depart-staff area | 0 |
| pierB-North Arrivals | 0 | Departures-Check-in | 883,611 |
| pierB-East and South Arrivals | 0 | Departures-North-Concourse | 244,207 |
| pierB-south | 211,429 | Arrivals-Landside-all | 0 |
| pierC-gate50 | 0 | Arrivals-Gates-08-and-09 | 0 |
| pierC-gate51 | 0 | Arrivals-Gates-24-and-25 | 0 |
| pierC-gate53 | 0 | arrivals-immiB | 98,223 |
| pierC-gate54 | 0 | arrival-immiC | 115,231 |
| pierC-gate55 | 0 | arrivals-PierB-North | 18,460 |
| pierC-gate56 | 0 | arrivals-PierB-south | 17,452 |
| pierC-gate57 | 0 | arrivals-PierB-west | 20,617 |
| pierC-gate58 | 0 | arrivals-PierC-all | 158,126 |
| pierC-gate59 | 0 | Forum | 532,942 |
| pierC-gate60 | 0 | pier C Inbound duty free | 0 |
| pierC-gate61 | 0 | | |
| pierC-gate63 | 0 | | |
| pierC-all | 430,112 | | |
| pier C Arrivals | 0 | | |
| pierC-corridor | 70,951 | | |






In our simulation, we use the DIMIA information, by far the most complete and reliable dataset available, to generate a
distribution of the number of passengers by flight to complement the FIDS information. The DWELL information is only
used to get a relative measure of walk time that is independent from the number of passengers. This failure to model the
dependency of walk time on congestion is one of the weaknesses of the model. The combination of the data sets and their
cross validation provides a clearer and more accurate picture of passenger flows in the airport.

B. Theory 

Our model possesses stochastic and dynamic state variables that change after events that occur at discrete time intervals. 
Our state variables are the numbers of passengers located in selected airport zones: at gates, in the immigration queues, 
at the immigration service desks, at check in counters and at landside. All other airport zones are not part of the system 
studied. 

The state variables are both stochastic and dynamic, which constitute the last two requirements of a discrete-event
simulation. The physical transition from one zone to the next, and the time spent in a zone, follow time-dependent probability
distributions. Changes in passenger count occur in batches, after the arrival or departure of a flight. For these reasons, an
event-based discrete-event simulation was chosen to represent passenger movement behaviour [15].

In an event-based simulation, time progresses directly to the next scheduled event. An event for the simulation can be 
the arrival of a flight, the departure of a flight from a gate, an arrival at immigration or a departure from the immigration 
zone for the arrival process. After an event, the state variables are updated. 

Future Event Lists (FELs) [16] are used to schedule events. They consist of a list of event notices containing the start time
and duration of a future event such as arrival or departure.

In the simulation, each passenger arrives at the next service node following an exponential distribution of inter-arrival
times. The service node includes a queue and a time-varying number of servers. The service is First-Come First-Served
(FCFS): the first passenger in the queue is always served first. Upon arrival, if all active desks are busy, the passenger
is scheduled to be processed by the first open desk. The passenger must wait, and will enter service after the first scheduled
departure time. If at least one active desk is free and no passenger is waiting to be processed, the passenger is processed and
transferred to a departure list, where his or her departure time is computed. If one desk is free and the list of waiting passengers
is not empty, the first passenger in the queue is scheduled for departure and removed from the queue. The departure time
from a desk follows an empirical service rate distribution that varies with time of day and day of the week.
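The event logic described above can be sketched for a single service node. This is a minimal illustration, not the authors' implementation: it assumes a constant number of desks and exponential service times with a fixed mean, whereas the paper uses a time-varying number of servers and an empirical service rate distribution. The function and variable names are hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_fcfs(arrival_times, num_desks, mean_service_time, seed=0):
    """Event-based simulation of one FCFS service node.

    arrival_times: passenger arrival times at the node, in seconds.
    num_desks: number of open desks (assumed constant in this sketch).
    mean_service_time: mean of the assumed exponential service time.
    Returns one (arrival, service_start, departure) tuple per passenger.
    """
    rng = random.Random(seed)
    # Future event list of (time, kind, passenger); kind 0 = departure and
    # kind 1 = arrival, so a desk freed at time t is reusable at time t.
    fel = [(t, 1, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(fel)
    waiting = deque()  # FCFS queue of passenger indices
    busy = 0           # state variable: number of busy desks
    start, depart = {}, {}
    while fel:
        now, kind, i = heapq.heappop(fel)  # jump to the next event
        if kind == 1:
            waiting.append(i)
        else:
            busy -= 1
            depart[i] = now
        # Update the state: start service while a desk is free.
        while waiting and busy < num_desks:
            j = waiting.popleft()
            busy += 1
            start[j] = now
            service = rng.expovariate(1.0 / mean_service_time)
            heapq.heappush(fel, (now + service, 0, j))
    return [(arrival_times[i], start[i], depart[i])
            for i in range(len(arrival_times))]
```

The time spent in the queue is `service_start - arrival` for each tuple, and queue lengths per time bin can be derived from the same output.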

Queue statistics including departure times, wait times, throughput and queue length can be derived from the state variables,
and are aggregated into 15-minute time bins. Waiting times are computed as the difference between the arrival and
departure time of a passenger. A delay corresponds to the time difference between arrival at the general queue and arrival
at a given server. The length of a queue is computed as the number of passengers in the queue at the end of a 15-minute
time interval.

Passengers are modelled individually from one queue to another. Passengers travelling together are not treated as a group.
There is no consideration of the fact that groups of passengers may have larger processing times at the different service
nodes and larger walk times. Similarly, all passengers are assigned the same priority at the service node, as a generalization
of the FCFS assumption. No special consideration is being given to Australian nationals as compared to foreigners in the
current simulation.

C. Model 

1) Arrivals: As constructed, the model assumes a single path from a given gate to immigration. A passenger goes 
through each zone of the system with probability 1. Time spent in the duty free shops, food courts or restrooms is assumed 
to be accounted for in the walking time distributions. Although all airports vary in the size of these zones and their 
configurations, the overall layout should be common among most airports and easily adjustable to study passenger flows 
at other airports than Sydney. 

Following Kendall’s notation [17], all queues are modelled as M(t)/M(t)/c(t) First-Come-First-Served (FCFS) queues. The
arrival process is Markovian (Poisson) and the service time distribution is exponential. The times between successive arrivals
are independent. Arrivals follow a Poisson process, with rates that vary depending on the location,
the arriving flights and the time of day. The service rates follow an empirical distribution for the different sections. Arrival
and departure times are assumed to be independent and identically distributed (i.i.d.). No bound was assigned to the length
of the queue. The number of servers varies with the staffing level used as a control variable.
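One common way to draw arrival times from a Poisson process with a time-varying rate is thinning. The sketch below is an illustration of that standard technique, not a description of how the paper's simulation samples arrivals; `rate_fn`, `rate_max` and `horizon` are hypothetical names, and `rate_max` is assumed to bound `rate_fn` from above over the whole horizon.

```python
import random


def sample_arrivals(rate_fn, rate_max, horizon, seed=0):
    """Sample arrival times on [0, horizon) by thinning.

    rate_fn(t): arrival rate at time t (arrivals per unit time).
    rate_max: assumed upper bound of rate_fn on [0, horizon).
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        # Candidate arrivals come from a homogeneous process at rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        # Keep the candidate with probability rate_fn(t) / rate_max.
        if rng.random() * rate_max < rate_fn(t):
            arrivals.append(t)
```

With a constant `rate_fn` equal to `rate_max`, every candidate is kept and the result is an ordinary Poisson process with exponential inter-arrival times.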


All the servers are assumed to be independent. The model described above can be modified to take into account the
existence of different immigration lines, for instance depending on the citizenship of the passenger. This can be achieved
by dedicating some of the servers to a given type of passengers. To extend the model to the case where servers dedicated
to national passengers can also serve foreigners if they are empty, some of the queues would become priority queues and
no longer FCFS.

For the arrival process, the simulation starts with flight arrivals at gates. Drawing from the DWELL information, the
arrival times at immigration are computed. Using service rates computed from the DIMIA information, departure times
are finally computed. The process is illustrated in Figure 5.

Fig. 5: Arrival process data flow


The inter-arrival times between a given gate and the closest immigration zones are obtained by observing the walks of
all passengers passing through these gates and arriving at the immigration zone. The DWELL data is queried across all
days to create two tables: one table for all passengers passing through a given gate and another table for all passengers
passing through immigration. The two tables are joined based on their device ID. The time difference between the last
record at the gates and the first one at immigration gives us the walk time for one passenger.
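The join described above can be sketched with plain dictionaries. The record layout `(device_id, timestamp)` and the function name are assumptions made for illustration, and discarding devices whose first immigration record does not follow their last gate record is an added assumption, not a rule stated in the text.

```python
def walk_times(gate_records, immigration_records):
    """Join gate and immigration sightings on device ID.

    Both inputs are iterables of (device_id, timestamp) pairs, with
    timestamps in seconds. Returns {device_id: walk time in seconds}.
    """
    last_at_gate = {}
    for device, t in gate_records:
        # Keep the last record observed at the gate.
        last_at_gate[device] = max(t, last_at_gate.get(device, t))
    first_at_immigration = {}
    for device, t in immigration_records:
        # Keep the first record observed at immigration.
        first_at_immigration[device] = min(
            t, first_at_immigration.get(device, t))
    result = {}
    for device, t_gate in last_at_gate.items():
        t_immi = first_at_immigration.get(device)
        # Assumption: only keep devices seen at immigration after the gate.
        if t_immi is not None and t_immi > t_gate:
            result[device] = t_immi - t_gate
    return result
```

Dividing the gate-to-immigration distance by each walk time then gives a walk speed sample for the corresponding device.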

Few of the passengers could be traced from DWELL data alone. As can be observed in Table III, the dataset is very
sparse in some zones. This table compares the total number of passengers observed at immigration against the number of
passengers who were found at immigration and any arrival gate.


TABLE III: Number of passengers traced by day

| | 4 Jul. '12 | 19 Jul. '12 | 15 Aug. '12 | 7 Dec. '12 | 8 Dec. '12 | 10 Dec. '12 |
|---|---|---|---|---|---|---|
| Total number of passengers | 81,857 | 28,830 | 88,195 | 87,671 | 88,195 | 27,701 |
| Number of passengers traced | 0 | 0 | 44 | 773 | 546 | 747 |


The service rate distribution is obtained from a subset of days with large delays at immigration. We use DIMIA information
to compute the number of passengers at the immigration service nodes at different hours of the day for all days in DWELL.
For each day of the week and time of the day, the days with the worst delays at the immigration service node are selected.
The days in this set were used to compute the service rate per desk as a function of time of the day. For each hour, the
maximum service rate from that set was kept. The assumption was that during peak demand the servers operated at their
highest throughput. The number of servers was obtained by looking at immigration data for specific days to recreate actual
operations, then modified to mitigate delays.
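The per-desk service rate computation can be sketched as follows. The record layout `(day, hour, desk_id)`, with one record per processed passenger, is an assumption made for illustration and is not the actual DIMIA schema; the selection of the worst-delay days is assumed to have been done beforehand.

```python
from collections import defaultdict


def max_service_rate_per_desk(records):
    """Highest observed service rate per desk for each hour of the day.

    records: iterable of (day, hour, desk_id), one per processed passenger.
    Returns {hour: max over days of passengers / unique open desks}.
    """
    counts = defaultdict(int)  # passengers processed per (day, hour)
    desks = defaultdict(set)   # unique desk IDs seen per (day, hour)
    for day, hour, desk in records:
        counts[(day, hour)] += 1
        desks[(day, hour)].add(desk)
    best = {}
    for (day, hour), n in counts.items():
        rate = n / len(desks[(day, hour)])  # passengers per desk per hour
        best[hour] = max(rate, best.get(hour, 0.0))
    return best
```

A desk counts as open in a given hour as soon as it appears in at least one record for that hour, which mirrors the way the number of unique open desks is estimated from the desk IDs.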

The simulation starts with the schedule of flights for a given day taken from FIDS. The block time and the walk time
distribution are used to determine at what time the passengers reach immigration. The number of servers depends on the
arrival time at immigration. Two cases can arise. If there is at least one unoccupied server, the next passenger is
processed immediately. If all servers are busy, the passenger is forced to
wait for service in the queue. The outputs of the simulation are the length of the queue at any time, the departure time
from immigration, the time to be served, and the time spent in the queue.




























II. Analysis 

The propagation of delays inside the airport is examined in order to identify the different observable factors affecting
it. The analysis was performed by:

• Analyzing the impact of flight delays on passengers, based on flight delays and passenger wait times at immigration.

• Quantifying the effects of queue length on overall capacity, and the saturation of the queue beyond a certain occupancy.

A. Delays Propagation 

To study the propagation of delays, we need means of measuring the effect of flight delays on passengers. This requires 
knowledge of information from flights, passengers and immigration. For this reason, the analysis was restricted to the 51 
days with recorded data in all three databases. 

The historic records of flights for 2012 are used to extract the daily arrivals of flights, which act as a demand on our
system. The demand distribution with respect to time is bimodal. There is a large demand in the morning between 6am
and noon, and a smaller peak in the afternoon between 3pm and 6pm. The average flight delays are also fully observable
from that database. An average flight delay of 26 minutes, for an average of 812 flights per day, is observed across
all days. Because the delays are only the difference between the scheduled and actual times of arrival
at the gates, they encompass en-route and taxi delays.

The operations at immigration are directly obtained from the immigration information. Low staffing levels were observed
around noon across all days. The low staffing period exacerbates the delays on the occasions where a delayed flight
arrives early in the afternoon. It takes more time for the system to recover from such a disruption.

After analyzing several days, we focus on three days in the dataset to illustrate the different trends in delay propagation:
Sunday August 12th 2012, Saturday November 10th 2012 and Wednesday July 25th 2012. For each of these days, we 
show the actual and the scheduled arrival flight times and the flight delays per hour of the day for the flights. We study 
the impacts of these delays on passengers by examining the throughput of the immigration services per hour of the day, 
together with the staffing levels. 



1) August 12th: August 12th 2012 is characteristic of most days in the dataset. The flight arrival times in Figure 6a
show that most flights arrive within 15 minutes of their scheduled arrivals. The exception is at 5am (see Figure 6b), when
three flights (AF8098, 1B7705 and QF2) scheduled to arrive at 5:15PM are subject to a 9-hour delay. As can be seen in
Figure 6d, the staffing level has been set to accommodate the early stream of flights. The second wave of departures from
immigration occurs around 8pm, see Figure 6c. That higher throughput indicates that more passengers are waiting to be
served, and possibly that the immigration services are still processing passengers from flights that have arrived between 6
and 7 pm.



(a) Flight Arrival Times on August 12th (731 arrivals)

(b) Flight delays in minutes on August 12th

(c) Actual Departure Times from Immigration on August 12th



(d) Hourly Staffing levels on August 12th 









B. November 10th 


The distribution of flight arrivals on November 10th in Figure 6e exhibits the bimodal trend mentioned above. With 693
arrivals compared to 731 on August 12th, fewer flights arrived on November 10th.
The flight delays are larger on August 12th. Since the throughput rates are lower, it indicates a high
variability in passenger arrivals that is not accounted for by the actual staffing levels.

Figure 6h shows an increase in the number of open desks. Yet that reaction is not adequate to respond to the demand.
The data highlight that a lack of predictability in demand affects passenger service, and results in over- or understaffing
at different times of the day.

(e) Flight Arrival Times on Nov 10 (693 arrivals)

(f) Flight delays in minutes on November 10

(g) Actual Departure Times from Immigration on November 10

(h) Hourly Staffing levels on November 10



























C. July 25th 

July 25th was the day with the most flight delays in the dataset. The average delay per flight was over an hour. Figure 6j
shows the average flight delay for all airport arrivals for each 15-minute time period. There are 9 periods with observed
delays greater than 2 hours.

Most flight delays occurred between 6 and 10 am, the high demand period of the airport, as illustrated in Figure 6i.
It appears from Figure 6l that the number of open desks was increased in anticipation of the demand. However, the
throughput is lower than on November 10th. This could be due to a saturation of the immigration services, faced with a larger
demand. The phenomenon of saturation is presented in the next section.

(i) Flight Arrival Times on July 25 (794 arrivals)

(j) Flight delays in minutes on July 25

(k) Actual Departure Times from Immigration on July 25

(l) Hourly Staffing levels on July 25


Passenger service can suffer from large flight delays. With many delays, there is more uncertainty in the actual arrival
times of the flights, and therefore a risk of overstaffing and understaffing key positions such as the immigration service
desks. As will be shown in the next section, operating at maximum demand also has negative effects. It forces the service
system to operate near saturation. Therefore the service system is unable to satisfy demand, and throughput is reduced.


































D. Queue saturation 

The passenger queue saturates after reaching a certain occupancy. The queue is said to saturate when the throughput
stops increasing with an increase in demand. Passenger demand is the number of passengers in the immigration queue,
which is determined by the arrival rate. This limit point can then be used in a threshold control policy, whose goal would
be to prevent the queue length from exceeding the saturation point. It can be done strategically using predictions based on
historic demand, and tactically by adapting to day-to-day operations. The ability to predict passenger demand based on
flight schedules is crucial for a tactical strategy.

To obtain the throughput at immigration, the service rate of passengers at the immigration desks is used. The demand
is computed from the number of passengers arriving at immigration derived from DWELL. After aggregating those two
statistics for the whole year, we generate the throughput versus demand curve. Figure 6 clearly illustrates the demand
saturation occurring after the queue reaches 280 passengers.
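One simple way to locate a saturation point on a throughput versus demand curve is to bin the observations by demand and find the bin where the mean throughput peaks. The sketch below illustrates that idea only; the binning rule, the `bin_width` default and the function name are assumptions, and the paper does not state how the 280-passenger value was read from the curve.

```python
from collections import defaultdict


def saturation_point(observations, bin_width=20):
    """Estimate the demand level at which mean throughput peaks.

    observations: iterable of (demand, throughput) pairs.
    Returns the lower edge of the demand bin with the highest mean
    throughput; ties are resolved in favour of the lowest bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for demand, throughput in observations:
        b = int(demand // bin_width)  # index of the demand bin
        sums[b] += throughput
        counts[b] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    best_bin = max(sorted(means), key=lambda b: means[b])
    return best_bin * bin_width
```

The returned demand level could then serve as the upper threshold of a threshold control policy on queue length.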



Fig. 6: Throughput vs. demand








































































III. Calibration for the queueing model 

Passengers arrive at immigration with a rate λ and depart with a rate μ. Arrival times and departure times are parametrized
using the average walk speed from gates to immigration, and the observed throughput at immigration respectively.


A. Modeling interarrival times from walk speed 

The average walk speed is defined as the average speed of a passenger going from one gate to immigration. The speed
is computed based on all interarrival times between gates and immigration from August 2012 to October 2012. In the
conversion from time to speed, the shortest distance from the gate to the closest immigration zone is used. Because of the
lack of granularity in the data, it is to be noted that the constructed distribution encompasses the walk to immigration, any
sightseeing or wandering in between zones, and presumably some time spent deboarding the airplanes.

1) Calibration process: There is some uncertainty associated with the location of a device. A device is often assigned
to multiple zones, making it difficult to compute the exact time spent between the gates and immigration for any single
device. As explained above, out of more than 80,000 devices observed in a day, fewer than 500 devices can be used to
extract a path to immigration. As a consequence of the lack of datapoints, the interarrival times for some gates cannot be
computed. To obtain accurate arrival rates for all the gates, we choose to use walk speed instead of walk times to model
the airport. Arrival times are computed solely based on the distance to the immigration services.

2) Walk Speeds Distributions: When fitting for different distributions, we find that walk times follow an exponential 
distribution as can be seen on Figures 7 and 8. 

The walk speed distributions for individual gates contain multiple modes. For gate 53 on figure 9, as many as 6 different 
modes can be observed. 

We use a two-step process to capture the different modes and model the distribution as a mixture of logistic distributions: 

1) Datapoints are clustered into different components using a nonparametric Expectation Maximization algorithm [18].
Using the posterior probabilities for each datapoint, we assign a point to a cluster if the posterior probability is higher
than 0.05. This ensures that we account for the contribution of all components to a given walk speed. It implies that
some of the clusters are overlapping, in agreement with what is observed in Figure 9.

2) A distribution is fit to each cluster, and the fit is evaluated using Akaike’s Information Criterion (AIC). The logistic
distribution was picked due to its finite support, and high goodness-of-fit value as seen in Table IV.
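The two steps above can be illustrated with a short sketch. It assumes that mixture components are already available as (weight, location, scale) triples of logistic densities, which is a simplification: the paper obtains the posteriors from a nonparametric EM algorithm [18], not from the parametric form used here. All names are hypothetical; the AIC formula is the standard 2k - 2 ln L.

```python
import math


def logistic_pdf(x, loc, scale):
    """Density of the logistic distribution (symmetric around loc)."""
    z = math.exp(-abs(x - loc) / scale)
    return z / (scale * (1.0 + z) ** 2)


def assign_clusters(x, components, threshold=0.05):
    """Indices of the components whose posterior at x exceeds threshold.

    components: list of (weight, loc, scale) triples (assumed known).
    A point may belong to several clusters, so clusters can overlap.
    """
    dens = [w * logistic_pdf(x, loc, s) for w, loc, s in components]
    total = sum(dens)
    if total == 0.0:
        return []
    return [k for k, d in enumerate(dens) if d / total > threshold]


def aic(log_likelihood, num_params):
    """Akaike's Information Criterion; lower values indicate a better fit."""
    return 2 * num_params - 2 * log_likelihood
```

A walk speed halfway between two equally weighted components is assigned to both, which is the overlap that the 0.05 threshold is meant to allow.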

Fig. 7: Walk Times for gate 54























3) Walk Speeds Fit: Based on the shape of the distribution, multiple fits are possible for the different modes.
One is a standard lognormal that only takes into account the first mode. We also fit the data to a Gaussian and a lognormal
mixture model, to compare the performance. The results of the different fits are summarized in Table IV.


Fig. 8: Walk Times for gate 60 (x-axis: Time [s])

Fig. 9: Walk Speed for gate 53


























































When the walk speed data is aggregated, the secondary modes become less prevalent, as shown in Figure 4.
Since the first mode dominates all the others, we can ignore any mixtures, and treat passenger walk speeds as a single
distribution. This may not be the most general solution, as the locations of the different distributions vary widely for
different gates, as observed in Table VI.

We instead opt for a mixture model to describe the walk speed. To build the model, we apply the two-step process described 
previously to the aggregated walk speed data for all gates. The resulting distributions appear in Figure 12. 
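
As an illustration of how such a mixture can be used in a simulation, the sketch below draws walk speeds from a mixture of logistic distributions by inverse-transform sampling. The component weights, locations and scales are hypothetical placeholders, not the fitted values of this paper, and the function names are assumptions.

```python
import math
import random

def sample_logistic(loc, scale, rng):
    """Inverse-CDF draw: x = loc + scale * ln(u / (1 - u)), with u kept away from 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return loc + scale * math.log(u / (1.0 - u))

def sample_mixture(components, n, rng):
    """Draw n values; each draw first picks a component according to its mixture weight."""
    weights = [w for w, _, _ in components]
    draws = []
    for _ in range(n):
        _, loc, scale = rng.choices(components, weights=weights, k=1)[0]
        draws.append(sample_logistic(loc, scale, rng))
    return draws

# Hypothetical (weight, location, scale) triples, NOT fitted values from the paper.
components = [(0.8, 1.2, 0.1), (0.2, 0.6, 0.1)]
speeds = sample_mixture(components, 20000, random.Random(0))
```

With a large sample, the sample mean approaches the weighted average of the component locations, which gives a quick sanity check on the sampler.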



Fig. 10: Walk Speed for gate 54 

Fig. 11: Walk Speed for gate 60 

The error in the model is measured by looking at the mean and the standard deviation of the model for the data 
between August and December, excluding the month of September. 

To calibrate our distribution, we compare the parameters of the walk speed distributions in Table VI. 

The service time at immigration is estimated per individual desk. It allows the use of the same service rate throughout 
the day independently of the number of active desks. This enables the use of a control scheme with the staffing level as 
a control parameter. 

We use the highest achieved service rates per desk as the service rate per desk. To compute it, the 10 days between August 
and October 2012 with the longest time spent at immigration are selected from the DWELL database. For those days, 
the throughput at immigration is calculated from DIMIA, and the service rate is computed by dividing the throughput by the 
number of active desks per 15 minutes. 

B. Service Rate 

In order to scale the service rate with the number of open desks, we have built a model of the service rate μ for an 
individual server. To obtain that service rate, the throughput at immigration is recorded for hours with an average wait 
time longer than 15 minutes. For these times, the service rate is computed by dividing the throughput by the number of 
active desks. The result is the empirical distribution in Figure 13. The service rate μ(t) is then defined as the number of 
open desks at a given hour multiplied by a random number generated from this distribution. 
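
The construction of μ(t) described above can be sketched as follows. This is an illustrative sketch only: the throughput and desk counts are hypothetical, the function names are assumptions, and the random draw is taken directly from the list of observed per-desk rates (resampling the empirical distribution).

```python
import random

def per_desk_rates(throughput, active_desks):
    """Throughput per active desk for each interval; intervals with no active desk are skipped."""
    return [t / d for t, d in zip(throughput, active_desks) if d > 0]

def service_rate(open_desks, empirical_rates, rng):
    """mu(t): number of open desks times one draw from the empirical per-desk rates."""
    return open_desks * rng.choice(empirical_rates)

# Hypothetical counts for three busy intervals (passengers processed, desks active).
rates = per_desk_rates([40, 30, 36], [8, 5, 6])
mu_t = service_rate(4, rates, random.Random(1))
```

Keeping the per-desk rate separate from the number of open desks is what allows the staffing level to act as a control parameter.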

The actual service rate is consistent with the pattern of flight arrivals, as can be seen in Figure 14. There is a first peak 
in the staffing level around 8 am followed by a second peak around 7 pm. Note that on several days, there is 
no record of any passenger crossing immigration around 1 pm. It is assumed that at that time a minimum staffing level 
is maintained, equal to the lowest staffing of the day. 
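
The minimum-staffing assumption can be expressed as a simple fill rule. The sketch below is illustrative: time slots without any immigration record are marked with None (a convention chosen here, not taken from the data), and the function name is hypothetical.

```python
def fill_missing_staffing(desks_by_slot):
    """Replace slots with no record (None) by the lowest staffing level observed that day."""
    observed = [d for d in desks_by_slot if d is not None]
    floor = min(observed)
    return [floor if d is None else d for d in desks_by_slot]

# Hypothetical desk counts for five consecutive slots; the third has no record.
staffing = fill_missing_staffing([6, 4, None, 3, 5])
```
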

Fig. 12: Walk Speed for all the gates 

Fig. 13: Service Rate distribution per desk 

Fig. 14: Number of Open Desks on November 26th 2012 

TABLE IV: Results of the goodness of fit test for gate 53 

Component    Logistic    Lognormal    Gamma 
1            1133.521    807.2598     810.1791 
2            161.0323    156.8922     158.1537 
3            1133.521    807.2598     810.1791 
4            98.04888    103.859      101.8572 

TABLE V: Clusters information for gate 53 



1.26 

3.16 

0.638 

6 


0.0999 

0.7558 

0.0627 



Mean 

Mixture Coefficient 


TABLE VI: Distribution of Walking Speeds per gate 

Gate       Mean        STD 
All        0.13765     1.01290 
Gate 53    -0.33383    1.03455 
Gate 54    0.00850     0.92939 
Gate 55    0.03031     0.98285 
Gate 56    0.29650     0.96464 
Gate 58    0.10605     1.01170 
Gate 59    0.36459     0.96816 
Gate 60    0.43726     0.92983 
Gate 61    0.29972     0.90701 




IV. Results of the simulation 


In this section, the results from the simulation are analyzed and validated against estimated wait times and queue length 
data obtained from DWELL. The wait time is defined as the time spent in the queue by a passenger from his or her arrival 
at the immigration zone to the beginning of service. The queue length is measured as the number of passengers left in the 
queue as a customer leaves the server. 
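
The two performance measures defined above can be computed from a simple first-come, first-served simulation with several desks. The sketch below is a minimal illustration under assumptions: arrival and service times are given as lists, passengers are served in arrival order by the first desk to become free, and the function name is hypothetical. It is not the simulation used in this paper.

```python
import heapq

def simulate_fifo(arrivals, service_times, n_desks):
    """Return (waits, queue_left) for a FIFO queue served by n_desks identical desks.

    arrivals must be sorted in increasing order.
    waits[i] is the time from arrival to start of service for passenger i.
    queue_left[i] is the number of passengers still waiting when passenger i departs.
    """
    free_at = [0.0] * n_desks  # heap of times at which each desk becomes free
    heapq.heapify(free_at)
    starts, departures = [], []
    for a, s in zip(arrivals, service_times):
        start = max(a, heapq.heappop(free_at))  # earliest available desk
        starts.append(start)
        departures.append(start + s)
        heapq.heappush(free_at, start + s)
    waits = [st - a for st, a in zip(starts, arrivals)]
    # Passengers who have arrived by time d but have not yet started service.
    queue_left = [
        sum(1 for a, st in zip(arrivals, starts) if a <= d and st > d)
        for d in departures
    ]
    return waits, queue_left

# Three passengers, one desk, 5 time units of service each (hypothetical values).
waits, queue_left = simulate_fifo([0, 1, 2], [5, 5, 5], n_desks=1)
```

The queue-length computation is quadratic in the number of passengers, which is acceptable for a sketch but would need an event-based counter for a full day of traffic.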

Twelve days were simulated. On two of these days the queue was unstable and grew without bound as the arrival rate 
increased during the day. 

A. July 25th 

As seen in Figures 6i and 6j, most flights were on time on July 25th, except for a few morning flights that were 
late by almost 10 hours. Due to this delay, we expect service to be punctual in the morning, as fewer passengers than expected 
present themselves at immigration, but slow in the afternoon. This is mostly what we observe in the predicted and actual 
delays at immigration. When comparing predicted delays to the delay information derived from DWELL in Figure 15, we 
can see that the simulation agrees with the actual wait times except for the large peak occurring before 3 pm. This is due 
to a low number of open desks in our model. 

Fig. 15: Average time spent in queue by a passenger on July 25th. 

The length of the queue in Figure 16 is low in the morning due to the high number of open desks. It increases in the 
afternoon due to a decrease in the number of open desks and the late arrival of passengers from delayed flights. The 
change in queue length is not as dramatic as the rise in wait times in the afternoon. This indicates that, although delays 
are larger in the afternoon, they affect few passengers. Note that whenever the service rate exceeds demand, our model 
does not predict the formation of any queue. 
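
The last remark can be illustrated with a fluid approximation of the queue, in which the backlog is updated interval by interval and clipped at zero. This recursion is an assumption made here for illustration; the update rule of the simulation itself is not spelled out in this section, and the numbers below are hypothetical.

```python
def fluid_queue(arrivals, capacities, q0=0.0):
    """Backlog recursion q[t] = max(0, q[t-1] + arrivals[t] - capacities[t])."""
    queue = []
    q = q0
    for lam, mu in zip(arrivals, capacities):
        q = max(0.0, q + lam - mu)
        queue.append(q)
    return queue

# Capacity above demand in every interval: no queue ever forms.
no_queue = fluid_queue([5, 5, 5], [6, 6, 6])
# Demand above capacity in the first two intervals: the backlog builds, then drains.
backlog = fluid_queue([10, 10, 4], [6, 6, 6])
```

Under this approximation a queue appears only in intervals where demand exceeds the service rate, and it keeps growing for as long as that remains true, which is the behavior seen on the unstable days.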






Fig. 16: Average number of passengers at immigration on July 25th. 

B. July 26th 

On July 26th, predicted and actual delays remained low for most passengers, as observed in Figure 17. The model does 
not account at all for the wait times exceeding 100 minutes, and underestimates the waiting times at the beginning of the 
day. As for July 25th, wait times occurring during the slow period of the day (2-3 pm) are overestimated. The predicted 
queue length in Figure 18 is also likely overestimated. 

Fig. 17: Average time spent in queue by a passenger on July 26th. 

Fig. 18: Simulated queue length July 26th. 

C. December 11th 


The model tends to agree with actual wait times on December 11th, see Figure 19. It slightly underpredicts delay at 
immigration in the morning, and overestimates it in the afternoon. 

Because staffing levels are lower in the afternoon, the last peak in demand, occurring at 8 pm, provokes longer queues, as 
seen in Figure 20. 

Fig. 19: Average time spent in queue by a passenger on December 11th. 



Fig. 20: Simulated queue length December 11th. 

For all simulation results that were compared to the actual wait times, we observed that the simulated wait times around 
2 pm were considerably higher than the actual ones. This can be attributed to an error in the number of recorded desks at 
this time: for some of the days, there are no immigration records between 1:30 pm and 2:00 pm. Moreover, as explained in 
our analysis, only a few of the data points can be used to obtain the time spent at immigration. This means that our actual 
wait times are probably a lower bound on the actual delays at immigration. 

V. Conclusion 

In this paper, we have considered the problem of modeling the arrival process of passengers at the immigration services 
of an international airport. 

In our analysis, we have investigated the factors affecting passenger delays at immigration. We have 
generalized the notion of passenger walk time to a model that is independent of the gate of origin by using mixture models. 
Our model has been validated against a year of operational data. 

Further research would address how to extend the model to other areas of the airport, and how to refine Wi-Fi information 
to obtain finer passenger location estimates. 